Arabic Phonetic Dictionaries for Speech Recognition

Authors

  • Mohamed Ali
  • Moustafa Elshafei
  • Mansour Al-Ghamdi
  • Husni Al-Muhtaseb

Abstract

Phonetic dictionaries are essential components of large-vocabulary, speaker-independent speech recognition systems. This paper presents a rule-based technique for generating phonetic dictionaries for a large-vocabulary Arabic speech recognition system. The system used conventional Arabic pronunciation rules, common pronunciation rules of Modern Standard Arabic (MSA), and some common dialectal cases. The paper explains these rules in detail and gives their formal mathematical representation. The rules were used to generate a dictionary for a 5.4-hour corpus of broadcast news. The rules and the phone set were tested and evaluated on an Arabic speech recognition system, which was trained on 4.3 hours of the 5.4-hour Arabic broadcast news corpus and tested on the remaining 1.1 hours. The phonetic dictionary contains 23,841 definitions corresponding to about 14,232 words. The language model contains both bigrams and trigrams. The resulting Word Error Rate (WER) was 9.0%.
DOI: 10.4018/jitr.2009062905. This paper appears in the Journal of Information Technology Research, Volume 2, Issue 4, edited by Mehdi Khosrow-Pour. © 2009, IGI Global, 701 E. Chocolate Avenue, Hershey PA 17033-1240, USA; Tel: 717/533-8845; Fax: 717/533-8661; URL: http://www.igi-global.com. Journal of Information Technology Research, 2(4), 67-80, October-December 2009. Copyright © 2009, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
[…] researchers to catch up with the progress of ASR technology in other languages. One of the key components of modern large-vocabulary speech recognition systems is the pronunciation, or phonetic, dictionary. This dictionary serves as an intermediary between the Acoustic Model and the Language Model in speech recognition systems.
It contains a subset of the words in the language together with the pronunciation of each word in terms of the phonemes or allophones available in the acoustic model. For instance, the CMU dictionary for North American English contains over 125,000 words and their transcriptions (CMU, 2008). The format of this dictionary is particularly useful for speech recognition and synthesis because it maps words to their pronunciations in a given phoneme set. Its current phoneme set contains 39 English phonemes, and the vowels may also carry lexical stress. Because English has a large number of pronunciation exceptions, this dictionary was essentially built manually by experts over many years. By contrast, the pronunciation of Arabic text follows specific rules when the text is fully diacritized. Many of these pronunciation rules can be found in Elshafei (1991) and Alghamdi et al. (2004).
The statistical approach to speech recognition (Huang et al., 2001; Jelinek, 1998; Rabiner & Juang, 1993) has dominated Automatic Speech Recognition (ASR) research over the last few decades, leading to a number of successes (Lee, 1988; Soltau et al., 2007; Stallard et al., 2008; Young, 1997; Zhou et al., 2003). Within this approach, the dominant technique is the Hidden Markov Model (HMM) (Rabiner, 1989). HMM-based ASR has enabled many successful applications that depend on large-vocabulary, speaker-independent, continuous speech recognition. The HMM-based technique essentially recognizes speech by estimating the likelihood of each phoneme over contiguous small frames of the speech signal (Huang et al., 2001; Rabiner & Juang, 1993). Words in the target vocabulary are modeled as sequences of phonemes, and a search procedure then finds, among the words in the vocabulary list, the phoneme sequence that best matches the phoneme sequence of the spoken word.
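To make the word-to-pronunciation mapping concrete, here is a minimal sketch that parses a plain-text dictionary in a CMU/Sphinx-style layout. It assumes one definition per line (a word, an optional `(n)` marker for an alternate pronunciation, then its phones) and `;;;` comment lines. The three sample entries use ARPAbet symbols purely for illustration and are not taken from the paper; `parse_dictionary` and `build_reverse_index` are hypothetical helper names. Counting definitions separately from words mirrors the abstract's distinction between 23,841 definitions and about 14,232 words. The reverse index is only a toy view of the phone-to-word direction that the search step needs; a real decoder scores partial hypotheses with the acoustic and language models rather than doing exact lookups.

```python
import re
from collections import defaultdict

# Hypothetical sample in a CMU/Sphinx-style layout: a word, an optional "(n)"
# marker for an alternate pronunciation, then space-separated phones (ARPAbet).
SAMPLE_DICT = """\
;;; illustrative entries only
READ  R IY1 D
READ(2)  R EH1 D
RED  R EH1 D
"""

# Strips a trailing alternate-pronunciation marker such as "(2)".
ALT_MARKER = re.compile(r"\(\d+\)$")


def parse_dictionary(text):
    """Map each word to its list of pronunciations (tuples of phones)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):  # skip blanks and comments
            continue
        head, *phones = line.split()
        lexicon[ALT_MARKER.sub("", head)].append(tuple(phones))
    return dict(lexicon)


def build_reverse_index(lexicon):
    """Map each phone sequence back to every word that can produce it."""
    index = defaultdict(set)
    for word, prons in lexicon.items():
        for pron in prons:
            index[pron].add(word)
    return dict(index)


lexicon = parse_dictionary(SAMPLE_DICT)
n_definitions = sum(len(prons) for prons in lexicon.values())
print(len(lexicon), n_definitions)  # prints: 2 3
index = build_reverse_index(lexicon)
print(sorted(index.get(("R", "EH1", "D"), set())))  # prints: ['READ', 'RED']
```

Keeping alternates as separate definitions under one word, rather than merging them, is what lets a recognizer accept either pronunciation while still reporting a single word.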
Two notable academic successes in developing high-performance, large-vocabulary, speaker-independent speech recognition systems are the Hidden Markov Model Toolkit (HTK), developed at Cambridge University (HTK, 2007), and the Sphinx system, developed at Carnegie Mellon University (Huang et al., 1993; Lamere et al., 2003; Noamany et al., 2007; Placeway et al., 1997). Developing an Arabic speech recognition system is a multidisciplinary effort that requires integrating Arabic phonetics (Alghamdi, 2000; Alghamdi et al., 2004), Arabic speech processing techniques (Elshafei, 1991; Elshafei et al., 2002), and natural language processing (Elshafei et al., 2006). Arabic speech recognition has recently been addressed by a number of researchers (Elshafei et al., 2008; Hiyassat, 2007; Noamany et al., 2007; Soltau et al., 2007). Saroti et al. (2007) used the Sphinx tools for Arabic speech recognition and demonstrated them on the recognition of isolated Arabic digits, using data recorded from six speakers; they achieved a digit recognition accuracy of 86.66%. Hiyassat (2007), in his Ph.D. thesis, developed a tool to generate Arabic pronunciation dictionaries; the generated dictionaries are based on a small MSA speech corpus consisting of digits or a command-and-control vocabulary. A workshop held at Johns Hopkins University in 2002 (Kirchhoff et al., 2003) defined and addressed the challenges of developing a speech recognition system for Egyptian dialectal Arabic telephone conversations, and proposed using a Romanization method to transcribe the speech corpus. Billa et al. (2002) addressed the problem of indexing Arabic broadcast news and discussed a number of research issues in Arabic speech recognition.
Further research on Arabic morphology was performed by Kirchhoff et al. (2006), who presented four different approaches to Arabic language modeling and introduced a novel technique called factored language models. Xiang et al. (2006) also investigated algorithms for separating words and affixes in language models. Afify et al. (2006) proposed a word-decomposition morphological language model to improve recognition rates for the Iraqi dialect. Messaoudi et al. (2006) investigated the problem of generating phonetic dictionaries and the effect of using morphological rules to generate pronunciations for very large databases of more than 1 million words. Gales et al. (2007) studied phonetic dictionary generation with a focus on the effect of multiple pronunciations on recognition quality; their research emphasizes including unsupervised training data as a way to improve overall system accuracy. Diehl et al. (2008) extended that effort with multi-phase pronunciation generation, in which expert rules cover cases that morphological analyzers cannot capture. Owing to the significant increase in available Arabic speech data, research on complete Automatic Arabic Speech Recognition (AASR) systems has grown, with efforts from IBM (Soltau et al., 2007) and the CMU/Interact group (Noamany et al., 2007). Both projects are part of the DARPA-supported GALE program (Gale, 2008), and both research teams highlighted the importance of speaker adaptation in improving recognition quality. This paper addresses the phonetic dictionary component of Arabic speech recognition systems. We provide detailed rules for automatically generating Arabic phonetic dictionaries and describe the evaluation of these rules using an Arabic broadcast news speech recognition system.
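The paper's actual rule set is given in its Section 4. Purely to illustrate how rules over fully diacritized text can turn spellings into dictionary entries, here is a toy sketch under heavily simplified assumptions: it covers only a handful of letters, the short vowels (fatha, damma, kasra), sukun, shadda (gemination), and vowel lengthening by alif, waw, and yaa, and it ignores hamza, tanween, taa marbuta, the definite article, and many other phenomena. The Latin phone symbols are hypothetical and are not the paper's phone set; all function and constant names are likewise hypothetical. The optional rule that drops a word-final short vowel is a simplified version of the pausal-form convention, included to show how one spelling can yield more than one dictionary definition.

```python
# Toy rule-based grapheme-to-phone sketch for fully diacritized Arabic.
# Hypothetical, simplified rules and phone symbols; NOT the paper's rule set.

# Letter subset -> hypothetical phone symbols (escapes avoid look-alike letters).
CONSONANTS = {
    "\u0628": "b",  # baa
    "\u062A": "t",  # taa
    "\u062F": "d",  # daal
    "\u0631": "r",  # raa
    "\u0633": "s",  # siin
    "\u0643": "k",  # kaaf
    "\u0644": "l",  # laam
    "\u0645": "m",  # miim
    "\u0646": "n",  # nuun
    "\u0648": "w",  # waaw (as a consonant)
    "\u064A": "y",  # yaa (as a consonant)
}
ALIF, WAW, YAA = "\u0627", "\u0648", "\u064A"
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SHADDA, SUKUN = "\u0651", "\u0652"

SHORT_VOWELS = {FATHA: "a", DAMMA: "u", KASRA: "i"}
DIACRITICS = set(SHORT_VOWELS) | {SHADDA, SUKUN}
# A vowelless alif/waw/yaa after the matching short vowel lengthens it.
LENGTHENERS = {ALIF: "a", WAW: "u", YAA: "i"}


def clusters(word):
    """Group each base letter with the set of diacritics that follow it."""
    out = []
    for ch in word:
        if ch in DIACRITICS and out:
            out[-1][1].add(ch)
        else:
            out.append((ch, set()))
    return out


def to_phones(word):
    """Apply the toy rules to one fully diacritized word."""
    phones = []
    for letter, marks in clusters(word):
        vowel = next((SHORT_VOWELS[m] for m in marks if m in SHORT_VOWELS), None)
        if vowel is None and phones and LENGTHENERS.get(letter) == phones[-1]:
            phones[-1] += phones[-1]  # long vowel: "a" -> "aa", etc.
            continue
        if letter not in CONSONANTS:
            raise ValueError(f"letter not covered by the toy rules: {letter!r}")
        phones.append(CONSONANTS[letter])
        if SHADDA in marks:  # shadda: geminate (repeat) the consonant
            phones.append(CONSONANTS[letter])
        if vowel is not None:
            phones.append(vowel)
    return phones


def pronunciations(word):
    """Canonical form plus an optional pausal variant without a final short vowel."""
    base = to_phones(word)
    variants = [base]
    if base and base[-1] in SHORT_VOWELS.values():
        variants.append(base[:-1])
    return variants


# Usage: kataba, kitaab and mudarris, built from code points.
kataba = "\u0643" + FATHA + "\u062A" + FATHA + "\u0628" + FATHA
kitaab = "\u0643" + KASRA + "\u062A" + FATHA + ALIF + "\u0628"
mudarris = "\u0645" + DAMMA + "\u062F" + FATHA + "\u0631" + SHADDA + KASRA + "\u0633"
print(pronunciations(kataba))  # [['k', 'a', 't', 'a', 'b', 'a'], ['k', 'a', 't', 'a', 'b']]
print(pronunciations(kitaab))  # [['k', 'i', 't', 'aa', 'b']]
print(to_phones(mudarris))     # ['m', 'u', 'd', 'a', 'r', 'r', 'i', 's']
```

Diacritics are collected into a set per letter because shadda and a short vowel may be typed in either order; the sketch treats both orders identically.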
In Section 2, we discuss the chosen Arabic phoneme set. Section 3 describes the methodology and formulation of the rules for generating the phonetic dictionary. Section 4 discusses the developed rules and their implications in detail. Finally, Section 5 presents an evaluation of the rule set, generating various test cases of the phonetic dictionary and comparing the recognition results.
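The evaluation compares recognition results, and the abstract summarizes them as a WER of 9.0%. WER is conventionally computed as (S + D + I) / N: the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The sketch below computes this with a word-level Levenshtein alignment and aggregates at corpus level (total errors over total reference words). The function names and sample sentences are illustrative assumptions; the paper's scoring tool is not named in this section.

```python
def word_errors(reference, hypothesis):
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Corpus-level WER from a list of (reference, hypothesis) pairs."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


pairs = [
    ("the cat sat on the mat", "the cat sit on mat"),  # 1 substitution, 1 deletion
    ("hello world", "hello world"),                    # no errors
]
print(corpus_wer(pairs))  # 0.25
```

Summing errors and reference words over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.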

Similar Resources

Phonetic tool for the Tunisian Arabic

A phonetic dictionary is an essential component of a speech recognition system or a speech synthesis system. Our work targets the generation of an automatic pronunciation dictionary for the Tunisian Arabic, in particular in the field of rail transport. To do this, we created two tools of phonetic vowelized and unvowelized words in the Tunisian Arabic. The proposed method to automatically genera...

Wiktionary as a source for automatic pronunciation extraction

In this paper, we analyze whether dictionaries from the World Wide Web which contain phonetic notations, may support the rapid creation of pronunciation dictionaries within the speech recognition and speech synthesis system building process. As a representative dictionary, we selected Wiktionary [1] since it is at hand in multiple languages and, in addition to the definitions of the words, many...

Generating Non-Native Pronunciation Lexicons by Phonological Rules

This paper presents a new approach to model prototypical foreign-accented pronunciation variants on the phonetic transcription level using rewrite rules. For each native language (L1) and target language (L2) pair, a set of postlexical rules is designed to transform canonical phonetic dictionaries of L2 into adapted dictionaries for native L1 speakers. Potential applications are speech recognit...

Dictionary learning: performance through consistency

We present first results from our efforts in automatically increasing and adapting phonetic dictionaries for spontaneous speech recognition. Spontaneous speech adds a variety of phenomena to a speech recognition task: false starts [1], human and nonhuman noises [2], new words [3] and alternative pronunciations. All of these phenomena have to be tackled when adapting a speech recognition system for...

Fusion of dictionaries in voice creation and speech synthesis task

The accurate phonetic transcription is very important for different fields of speech technologies. In speech synthesis, it is the benchmark for the voice segmentation, and therefore one of the crucial points for the synthesized speech pronunciation quality. In ASR the availability of matching phonetic transcription allows a higher recognition precision. Use of different dictionaries could impro...

Journal title:
  • JITR

Volume 2, Issue

Pages -

Publication date: 2009